15 - Pattern Recognition [PR] - PR 12 [ID:22786]

Welcome back to Pattern Recognition. Today we want to continue thinking about discriminant modeling and feature transforms. We had this idea of doing essentially a class-wise normalization in our feature space, and if we do so, certain properties emerge in this feature space that we want to have a look at today.

So this is essentially the pathway towards linear discriminant analysis. We have some input training data that now consists not just of feature vectors but also of the associated class labels, and if we have the class labels, then we can essentially apply the transform as we already talked about.
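To make this setup concrete, here is a tiny illustrative example in numpy; the arrays and values are made up for illustration and are not from the lecture:

```python
import numpy as np

# Hypothetical toy training set: each row of X is a feature vector,
# y holds the corresponding class label (two classes, 0 and 1).
X = np.array([[1.0, 2.0],
              [1.5, 1.8],
              [1.2, 2.2],
              [5.0, 8.0],
              [6.0, 9.0],
              [5.5, 8.5]])
y = np.array([0, 0, 0, 1, 1, 1])
```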

So let's see how we can apply this in order to find a transform that models the distributions of the different classes. The first thing we need to do is figure out the joint covariance matrix. Here we actually look at all of the observations, so we compute just a single covariance matrix over the entire set, but of course we respect the class membership by normalizing with different means. So we compute the means for the different classes, and then we compute a joint covariance over all of the feature vectors: it measures the variance with respect to all of the observations, where each observation is compared to its respective class mean. This gives us the joint covariance matrix sigma hat.
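A minimal numpy sketch of this step, assuming a feature matrix X (one row per observation) and a label vector y as in the toy example above; the function name and the normalization by N are my own choices:

```python
import numpy as np

def class_means_and_pooled_covariance(X, y):
    """Per-class means and the joint (pooled) covariance matrix:
    every observation is centered with the mean of its own class,
    then a single covariance is estimated over all centered vectors."""
    classes = np.unique(y)
    means = {c: X[y == c].mean(axis=0) for c in classes}
    # Center each feature vector with its respective class mean.
    centered = np.vstack([X[y == c] - means[c] for c in classes])
    # Maximum-likelihood style normalization by the total number of samples.
    sigma_hat = centered.T @ centered / len(X)
    return means, sigma_hat
```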

Now we can use our trick: sigma hat can be decomposed into U D U transpose, and this allows us to define a transform phi that maps into the normalized space. This transform is given as D to the power of minus 0.5 times U transpose. If we do so, then we can apply the same transform to our means and we get the normalized means mu prime, where mu prime is simply the application of the transform to the respective class mean. So now we have the feature transform phi and the transformed mean vectors mu prime.
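Continuing the sketch, the transform phi and the transformed means could be computed like this (assuming sigma_hat is full rank so that D to the power of minus 0.5 exists; again only an illustration, not the lecture's reference implementation):

```python
import numpy as np

def whitening_transform(sigma_hat):
    """Decompose sigma_hat = U D U^T and build phi = D^(-1/2) U^T,
    which maps the data into the normalized (sphered) space."""
    eigvals, U = np.linalg.eigh(sigma_hat)       # sigma_hat is symmetric
    D_inv_sqrt = np.diag(1.0 / np.sqrt(eigvals)) # assumes all eigenvalues > 0
    return D_inv_sqrt @ U.T                      # phi as a matrix

# Usage with the previous sketch:
#   means, sigma_hat = class_means_and_pooled_covariance(X, y)
#   phi = whitening_transform(sigma_hat)
#   means_prime = {c: phi @ mu for c, mu in means.items()}   # mu' = phi mu
```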

Now let's look into the actual decision rule on this sphered data, as you could say. Here we have our y star that again maximizes the posterior probability. We use again the trick of decomposing the posterior into the prior and the class conditional. We already know that we are going to use a Gaussian, so we apply the logarithm and get rid of all the terms that do not change the decision. So everything that does not influence the decision has been removed in the right-hand term, and we end up with just the exponent of the Gaussian distribution. We also see that in this case we have an identity matrix as covariance, so it cancels out and we just have the feature transform in here. If we regard this, we can remap this inner product into an L2 norm, so we are essentially looking at a comparison in the normalized space that computes the Euclidean distance between the normalized points. Of course there is still some influence of the class prior, which would cancel out in the decision rule if we had the same priors for all of the classes.
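A sketch of this decision rule on the sphered data, assuming phi, the transformed means, and the class priors in the form used in the previous sketches; the terms that do not depend on the class have already been dropped:

```python
import numpy as np

def classify(x, phi, means_prime, priors):
    """Pick the class maximizing  log prior - 0.5 * ||phi x - mu'_y||^2,
    i.e. the log posterior up to class-independent constants."""
    x_prime = phi @ x
    scores = {c: np.log(priors[c]) - 0.5 * np.sum((x_prime - mu_p) ** 2)
              for c, mu_p in means_prime.items()}
    return max(scores, key=scores.get)
```

With equal priors the log-prior term is identical for every class, so the rule simply picks the class with the closest transformed mean, which is exactly the nearest neighbor behaviour summarized next.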

So, some conclusions: if all of the classes share the same prior, this is nothing else than the nearest neighbor decision rule where the transformed mean vectors are used as prototypes, so we simply choose the class whose transformed mean we are closest to. However, also note that the feature transform phi does not change the dimension of the features; here we essentially just have a rotation and a scaling applied, and this is steered by the global covariance.

Now let's think about whether this makes sense or not; I have a geometric argument here. Here we show the two means, so let's consider the case of two classes. If I transform everything, then I am in this transformed space, so everything here is mapped by phi; this is why I have phi of x, phi of mu1, and phi of mu0. Now I can connect the two class centers, phi of mu0 and phi of mu1, and this gives us the connection a: a is the vector connecting the two class centers, and it is determined simply as the subtraction of the two transformed means. This is the information that is really relevant for deciding the class; remember that our decision boundaries in this normalized space are going to be lines. Now if I have this connection between the two, then for the classification of an arbitrary feature vector it actually doesn't make any difference anymore how far it is moved along this decision boundary. So we can take an arbitrary vector x and compute the difference between phi of x and phi of mu0, and you see that it doesn't matter how we move it orthogonally to a: only the component of this difference along a influences the decision.
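A small numerical check of this geometric argument, with made-up values for the two transformed means: the difference of the squared distances to phi of mu0 and phi of mu1 depends only on the component of the transformed feature vector along a, so shifting it orthogonally to a does not change the decision.

```python
import numpy as np

# Transformed class means (made-up values) and their connection a.
m0 = np.array([0.0, 0.0])
m1 = np.array([2.0, 1.0])
a = m1 - m0

def score(p):
    """Difference of squared distances to the two transformed means;
    its sign decides the class in the equal-prior case."""
    return np.sum((p - m0) ** 2) - np.sum((p - m1) ** 2)

p = np.array([0.7, 1.9])          # some transformed feature vector phi(x)
a_perp = np.array([-a[1], a[0]])  # a direction orthogonal to a

# Moving p orthogonally to a leaves the score (and the decision) unchanged.
print(score(p), score(p + 3.0 * a_perp))   # both values are equal
```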

Part of a video series
Accessible via: Open Access
Duration: 00:12:13 min
Recording date: 2020-11-04
Uploaded on: 2020-11-04 12:18:22
Language: en-US

In this video, we look into some useful properties of discriminant analyses.

This video is released under CC BY 4.0. Please feel free to share and reuse.

For reminders to watch the new videos, follow us on Twitter or LinkedIn. Also, join our network for information about talks, videos, and job offers in our Facebook and LinkedIn Groups.

Music Reference: Damiano Baldoni - Thinking of You
